Closing the DBTL loop with TeselaGen

With TeselaGen's platform you can close the Design-Build-Test-Learn (DBTL) cycle using machine learning algorithms that automatically learn from your data. The DISCOVER module can suggest new candidates that may improve your results, based on your previous experimental rounds. This document shows how to turn those candidates into new designs in the DESIGN module, ready for the next DBTL cycle.

Inputs: The result of an Evolution algorithm run in a DISCOVER module instance

Outputs: New designs created in the DESIGN module

Requirements:

  • Access permissions to the lab where the Evolution results are stored
  • Python 3 installed on your local computer, with pandas and TeselaGen's api-client
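
A quick way to confirm the requirements above is to check that both packages can be found for import. The sketch below is only illustrative: `missing_packages` is a hypothetical helper, and it assumes the api-client is importable as `teselagen`, which is the name used in the imports of this notebook.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "teselagen" is the import name used later in this notebook.
missing = missing_packages(["pandas", "teselagen"])
if missing:
    print(f"Missing packages: {', '.join(missing)}")
else:
    print("All required packages are installed")
```

The helper only looks at top-level package names; it does not check versions.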

First, we make all the required imports:

In [1]:
import platform
from IPython.display import display, HTML
import pandas as pd

from teselagen.api import DISCOVERClient, DESIGNClient
from teselagen.utils.candidates_to_design import build_design_from_candidates

print(f"python version     : {platform.python_version()}")
print(f"pandas version     : {pd.__version__}")
python version     : 3.6.9
pandas version     : 1.1.4

Look for your Evolution results

Here, the concept of closing the DBTL loop refers to the ability to generate designs out of what was learned from previous experiments. Those designs can be used to conduct new experimental rounds. This notebook assumes you've already trained an Evolution model.

The results of an Evolution model contain a set of ranked candidates that may outperform your current measurements. Each of the proposed candidates is a combination of the parts (and possibly other variables) you have already tested within the designs in your experiments. These new combinations were evaluated and ranked by a machine learning algorithm and we will generate proper designs with them.
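
To make the idea of a candidate concrete, the sketch below enumerates the part combinations that have not been tested yet. It is only an illustration with made-up variant names: `variants_a`, `variants_b`, and `tested` are hypothetical, and the ranking performed by the Evolution model is not reproduced here.

```python
from itertools import product

# Hypothetical variants available for each bin (illustrative names only).
variants_a = ["A0", "A1", "A2"]
variants_b = ["B0", "B1"]

# Hypothetical combinations already measured in previous rounds.
tested = {("A0", "B0"), ("A1", "B1")}

# Candidates are the combinations of known parts that were never tested.
untested = [combo for combo in product(variants_a, variants_b) if combo not in tested]

for enzyme_a, enzyme_b in untested:
    print(enzyme_a, enzyme_b)
```

In the real workflow, the model scores combinations like these and returns them with a priority.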

This guide starts from the output of the Evolutions tool in DISCOVER. The next cell connects the notebook to DISCOVER and selects the Common lab, which holds our sample experiment:

In [2]:
# Connect to your TeselaGen instance by passing its URL as the 'host_url' argument: DISCOVERClient(host_url=host_url)
# client = DISCOVERClient(host_url="https://your-instance-name.teselagen.com")
client = DISCOVERClient()
client.login()
client.select_laboratory()
Connection Accepted
Received None lab identifiers
Selected Common Lab

Next, we find the evolutive model named Teselagen Example Evolutive Model:

In [3]:
search_for_name = "Teselagen Example Evolutive Model"
evolution_models_info = client.get_models_by_type('evolutive')
model_id = -1
for info in evolution_models_info:
    if info['name'] == search_for_name:
        model_id = info['id']
        print(f"Model id {info['id']}, name: {info['name']}")
if model_id == -1:
    raise IOError(f"Model not found: {search_for_name}")
Model id 65, name: Teselagen Example Evolutive Model
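
The lookup above can also be written as a small function. This is a sketch, not part of the api-client: `find_model_id` is a hypothetical helper, and the sample list only mimics the `name` and `id` keys used in the cell above.

```python
def find_model_id(models_info, name):
    """Return the id of the first model called `name`, or raise LookupError."""
    for info in models_info:
        if info["name"] == name:
            return info["id"]
    raise LookupError(f"No model named {name!r} was found")


# Sample data shaped like the entries used above (values are made up).
sample_models = [
    {"id": "12", "name": "Some Other Model"},
    {"id": "65", "name": "Teselagen Example Evolutive Model"},
]
print(find_model_id(sample_models, "Teselagen Example Evolutive Model"))
```

Returning from inside the loop removes the need for a sentinel value such as -1.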

Then we get the model's results. The results object contains predictions for several untested combinations. We will focus on the rows with valid priority values, which are the top candidates suggested by the algorithm:

In [4]:
results = client.get_model_datapoints(model_id=model_id, datapoint_type="output", batch_size=400, batch_number=1)
data = pd.DataFrame([el['datapoint'] for el in results['data']])
data = data.dropna(subset=['priority']).reset_index(drop=True)
display(data)
   Teselagen Enzyme A  Teselagen Enzyme B  Production  prediction     sigma       acq  in_batch  priority
0          Variant A1          Variant B5         NaN    6.544949  2.851464  0.290311      True       0.0
1          Variant A4          Variant B3         NaN    6.224357  2.517428  0.160054      True       1.0
2          Variant A5          Variant B4         NaN    6.179204  2.372436  0.125915      True       2.0
3          Variant A1          Variant B3         NaN    4.842277  3.142049  0.127976      True       3.0
4          Variant A0          Variant B5         NaN    5.172261  3.389486  0.208031      True       4.0
5          Variant A3          Variant B5         NaN    7.085248  2.010375  0.168389      True       5.0
6          Variant A4          Variant B4         NaN    6.287957  2.238001  0.112734      True       6.0
7          Variant A5          Variant B1         NaN    4.789871  2.664990  0.059986      True       7.0
8          Variant A4          Variant B2         NaN    4.678120  2.237455  0.020480      True       8.0
9          Variant A2          Variant B3         NaN    5.693773  2.384573  0.082872      True       9.0

Note the algorithm doesn't suggest candidates you've already tested. That's why the Production column, the unknown variable for untested combinations in this example, contains only NaN values.
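
The filtering step can be tried on a small, made-up table. The sketch assumes that rows the algorithm did not select have a missing `priority`; the values are illustrative and do not come from a real model.

```python
import pandas as pd

# Made-up datapoints: only two rows have a valid priority.
toy = pd.DataFrame(
    {
        "Teselagen Enzyme A": ["Variant A1", "Variant A2", "Variant A4"],
        "Teselagen Enzyme B": ["Variant B5", "Variant B0", "Variant B3"],
        "prediction": [6.5, 3.1, 6.2],
        "priority": [1.0, float("nan"), 0.0],
    }
)

# Keep rows with a priority, then order them by that column.
candidates = (
    toy.dropna(subset=["priority"])
    .sort_values("priority")
    .reset_index(drop=True)
)
print(candidates)
```

Sorting is optional here; it only makes the order of the candidates explicit.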

Build the design JSON

Now we need to generate a JSON representation of the candidates so that it can be imported into DESIGN. The api-client library provides a utility for this called build_design_from_candidates. It receives a list of dictionaries as input and requires you to explicitly declare the columns that should be interpreted as bins. Continuing with the example:

In [5]:
design = build_design_from_candidates(
    candidates_data=data.to_dict(orient="records"),
    bin_cols=['Teselagen Enzyme A', 'Teselagen Enzyme B'],
    name="Closing DBTL Example",
    priority_col='priority'
)
Generating design using 10 candidates
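
Because the bin columns must be declared explicitly, it can help to check the records before building the design. The following is a sketch only: `check_candidates` is a hypothetical helper, not part of the api-client, and the records are made up.

```python
def check_candidates(records, bin_cols, priority_col):
    """Raise ValueError if any record lacks a bin column or the priority column."""
    required = list(bin_cols) + [priority_col]
    for position, record in enumerate(records):
        missing = [col for col in required if col not in record]
        if missing:
            raise ValueError(f"Record {position} is missing columns: {missing}")
    return len(records)


# Records shaped like the output of data.to_dict(orient="records").
records = [
    {"Teselagen Enzyme A": "Variant A1", "Teselagen Enzyme B": "Variant B5", "priority": 0.0},
    {"Teselagen Enzyme A": "Variant A4", "Teselagen Enzyme B": "Variant B3", "priority": 1.0},
]
n_checked = check_candidates(
    records,
    bin_cols=["Teselagen Enzyme A", "Teselagen Enzyme B"],
    priority_col="priority",
)
print(f"{n_checked} candidates look consistent")
```

A check like this catches a misspelled column name before the records reach the utility.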

The design variable contains a dictionary representation of the design. This representation can easily be stored as a JSON file and then uploaded into DESIGN. To upload it from the notebook, we first create a DESIGNClient instance:

In [6]:
design_client = DESIGNClient(host_url=client.host_url)
design_client.select_laboratory()
Received None lab identifiers
Selected Common Lab
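
The design dictionary can also be kept as a local JSON file using only the standard library. In this sketch, `example_design` is a placeholder dictionary, since the actual structure produced by build_design_from_candidates is not shown here, and the file name is arbitrary.

```python
import json
import os
import tempfile

# Placeholder standing in for the `design` dictionary (structure is made up).
example_design = {"name": "Closing DBTL Example", "placeholder": True}

# Write the dictionary to a JSON file in the system's temporary directory.
output_path = os.path.join(tempfile.gettempdir(), "closing_dbtl_example.json")
with open(output_path, "w", encoding="utf-8") as handle:
    json.dump(example_design, handle, indent=2)

# Read it back to confirm the file round-trips.
with open(output_path, encoding="utf-8") as handle:
    restored = json.load(handle)
print(restored == example_design)
```

In the notebook you would pass `design` instead of the placeholder and choose your own path.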

Then we upload the design. On success, the post_design method returns the id of the generated design:

In [7]:
response = design_client.post_design(design=design)
display(response)
Connection Accepted
{'id': '1215'}

The new design should now be available in the DESIGN module.

Uncomment and run the following cell to get the design link:

In [8]:
# design_url = f"{design_client.host_url}/design/client/designs/{response['id']}"
# display(HTML(f"""<a href="{design_url}">{design_url}</a>"""))
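
The link-building step can also be wrapped in a small function that checks the response first. This is a sketch under stated assumptions: `build_design_url` is a hypothetical helper, the URL pattern is copied from the cell above, and it assumes an unsuccessful upload yields a dictionary without an `id`, which is not confirmed here.

```python
def build_design_url(host_url, response):
    """Build the design link, following the URL pattern of the cell above."""
    design_id = response.get("id")
    if not design_id:
        # Assumption: a failed upload returns a dictionary with no "id" key.
        raise ValueError(f"Response has no design id: {response!r}")
    return f"{host_url.rstrip('/')}/design/client/designs/{design_id}"


# Host name taken from the placeholder used earlier in this notebook.
print(build_design_url("https://your-instance-name.teselagen.com", {"id": "1215"}))
```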